While experimentation is an integral aspect of the capability development and acquisition
process, its methods may be less familiar to testers. This article provides a framework for

An article on experiment techniques^1 should be an interesting read for this
audience of testers. When asked whether testing and experimenting are similar
activities, about half of test engineers and analysts might agree that they are.
A similar question to experimenters located in Service battle labs
would find far fewer considering test and experiment
similar. The U.S. Department of Defense (DoD) has
differentiated between testing and experimenting,
tying tests to the acquisition process and experiments
to the concept and capability exploration process. So is
there a difference in test and experiment techniques?

The  answer  to  this  question  is  in  two  parts.  Readers 
of  this  journal  are  familiar  with  the  nature  of  testing 
and  test  design.  This  initial  article  will  therefore 
characterize  warfighting  experiments  and  their  design 
requirements.  A  follow-up  article  in  the  next  issue  will 
then  compare  experiments  with  tests. 

Experiments and the capability development process

Tests are conducted on early capability modules,
subsystems, prototypes, and production items to quantify
the degree of design success. Experiments are also
employed throughout this process. Experiments provide
a scientific empirical method to identify capability gaps,
explore alternative solutions, and develop and continuously
update implementation techniques.

Prior to initialization of a capability development
process, early experiments identify future warfighting
gaps and assess relative merits of proposed doctrine,
organization, training, materiel, leadership, personnel,
and facilities (DOTMLPF). Analyses of alternatives
(AOA) include experiments conducted with combat
simulations.

Early in the acquisition process, experiments compare
alternative designs and alternative competing
solutions. Later, prior to testing of early prototypes,
experiments assist combat developers in assessing new
tactics, techniques, and procedures (TTP) required for
optimizing employment of the new capability. After
capability fielding, warfighting experiments can continuously
examine opportunities to further enhance
capability employment as environments and threats
evolve.

Definition  of  a  warfighting  experiment 

In  its  simplest  formulation,  to  experiment  is  to  try. 
In  this  sense,  experimentation  is  a  characteristic  of 
human  nature  and  has  existed  from  earliest  times. 
When  early  humans  attempted  different  ways  to  chip 
stone  into  cutting  edges  or  selected  seeds  to  grow 
sturdier  crops,  they  were  experimenting. 

More  formally,  “...to  experiment  is  to  explore  the 
effects  of  manipulating  a  variable.”  (Shadish,  Cook, 
and  Campbell  2002). 

This definition captures the basic themes of gaining
new knowledge (explore), doing something (manipulating
a variable), and causality (the effects). Based on
this general definition, the author offers the following
derivative for warfighting experimentation:

Warfighting Experimentation: to explore the effects
of manipulating proposed warfighting capabilities
or conditions.

Experiment  cause  and  effect  and  hypotheses 

Figure 1. Five elements of an experiment: (1) Treatment (A): new systems, processes, etc.;
(2) Experimental Unit: operators, units, etc.; (3) Effect (B): detections, times, etc.;
(4) Trial Conditions: day, night, urban, etc.; (5) Analysis: comparisons, etc.

Identifying experiments with the investigation of
causality is the key to understanding experiments and
linking experiments to the transformation process.
Causality  is  central  to  the  transformation  process. 
Military  decision-makers  need  to  know  what  to  change 
in  order  to  improve  military  effectiveness.  The 
antecedent  causes  of  effectiveness  must  be  understood 
in  order  to  change  effectiveness.  Effectiveness  is 
improved  by  altering  its  antecedents,  its  causes. 
“Today,  the  key  feature  common  to  all  experiments  is 
still  to  deliberately  vary  something  so  as  to  discover 
what  happens  to  something  later — to  discover  the 
effects  of  presumed  causes.”  (Shadish,  Cook,  and 
Campbell  2002).  The  notion  of  cause  and  effect  is 
inherent  in  the  language  of  experimentation  and  in  its 
basic  paradigm  “let’s  do  this  and  see  what  happens.”  All 
warfighting  innovation  questions  can  be  translated  into 
cause-and-effect  questions  expressed  as:  “does  A  cause 
B?”  Does  the  proposed  military  capability  (A)  produce 
(cause)  an  increase  in  warfighting  effectiveness  (B)? 
This  theme  is  fundamental  to  constructing  the 
experiment  hypothesis: 

If  a  unit  uses  the  new  capability  (A), 

then  it  will  increase  in  effectiveness  (B). 

Hypotheses  are  expectations  about  A  causing  B.  The 
nature  of  experiment  hypotheses  prepares  us  to 
understand the five key components common to all
experiments. 

Five  elements  of  an  experiment 

In large experiments with many moving parts it is
sometimes difficult to see the forest for the trees. All
experiments (large or small, field or laboratory,
military or academic, applied or pure) can be described
by five basic components (Cook and Campbell
1979), as depicted in Figure 1, and all five are related to
causality.

1.  The  treatment,  the  possible  cause  (A),  is  the 
proposed  capability,  the  proposed  solution  that  is 
expected  to  influence  warfighting  effectiveness; 

2.  The  experimental  unit  executes  the  possible  cause 
and  produces  an  effect; 

3.  The  possible  effect  (B)  of  the  treatment  is  the 
result  of  the  trial,  an  increase  or  decrease  in  some 
aspect  of  warfighting  effectiveness; 

4.  The  trial  is  one  observation  of  the  experimental 
unit  employing  the  treatment  (A)  or  its  variation  (-A) 
to  see  whether  effect  (B)  occurs  and  includes  all  of  the 
contextual  conditions  under  which  the  experiment  is 
executed;  and 

5.  The  analysis  phase  compares  the  results  from  one 
trial  to  a  different  trial  to  quantify  the  impact  of  A 
on  B. 
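
The five elements listed above can be sketched as a small data structure. This is only an illustrative sketch: the names `Trial` and `compare_trials` are hypothetical, and the detection counts are invented for the example rather than taken from any experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One observation of an experimental unit under given conditions."""
    treatment: str          # element 1: possible cause (A) or its variation
    unit: str               # element 2: experimental unit executing the treatment
    conditions: str         # element 4: contextual trial conditions
    targets_presented: int
    targets_detected: int

    @property
    def effect(self) -> float:
        """Element 3: possible effect (B), here percent of targets detected."""
        return 100.0 * self.targets_detected / self.targets_presented


def compare_trials(baseline: Trial, treated: Trial) -> float:
    """Element 5: analysis, the difference in effect between two trials."""
    return treated.effect - baseline.effect


# Hypothetical numbers, invented for illustration only.
day_one = Trial("current sensors", "sensor platoon", "day, open terrain", 40, 18)
day_two = Trial("new sensors", "sensor platoon", "day, open terrain", 40, 26)
print(compare_trials(day_one, day_two))  # difference in percent detected
```

A single difference like this says nothing yet about whether the change is real or attributable to the treatment; that is the subject of the validity requirements that follow.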

Four  requirements  for  a  valid  experiment 

While defense experiment agencies have developed
lists of lessons learned and best practices^2 to increase
experiment rigor (validity), experiment validity is rarely
formally defined. The adjective valid is defined as
follows:

“Valid: well-grounded or justifiable, being at once
relevant and meaningful, logically correct. [Synonyms:
sound, cogent, convincing, and telling.]” (Merriam-Webster
Dictionary online, 2006)

When this definition is combined with the notion of
cause-and-effect, a definition of a valid experiment is
apparent: A valid experiment provides sufficient
evidence to make a conclusion about the truth or
falsity of the causal relationship between the manipulated
variable and its effect.

Figure 2. Four requirements for a good (valid) experiment

Hypothesis: If A, then B

1. Requirement: Ability to use the new capability. Evidence of validity: A occurred.
Threat to validity: The asset did not work or was not used.

2. Requirement: Ability to detect change. Evidence of validity: B changed as A changed.
Threat to validity: Too much noise. Cannot detect any change.

3. Requirement: Ability to isolate the reason for the change. Evidence of validity:
A alone caused B. Threat to validity: Alternate explanations for the change are available.

4. Requirement: Ability to relate results to actual operations. Evidence of validity:
Change in B due to A is expected in ... Threat to validity: The observed change may not
be applicable.

How does one design an experiment to ensure
sufficient validity? All of the good practices for
designing warfighting experiments can be organized
under four logically sequenced requirements^3 that
must be met to achieve a valid experiment (Figure 2).
A simple example will illustrate these four requirements.
Suppose a capability-gap analysis postulates
that new sensors are required to detect time-critical
targets. An experiment to examine this proposition
might be a 2-day military exercise in which the current
array of sensors is employed on the first day and a new
sensor suite is used on day two. The primary measure
of effectiveness is the percent of targets detected. The
hypothesis is: “If new sensors are employed, then time-critical
target detections will increase.” This experiment
is designed to determine whether the new sensors
(A) will cause an increase in detections (B).

Ability  to  use  the  new  capability 

In  most  warfighting  experiments,  the  majority  of 
resources  and  effort  are  expended  to  bring  the  new 
experimental  capability  to  the  experiment.  In  the  ideal 
experiment,  the  experimental  capability  (the  new  sensor) 
is  employed  by  experiment  players  to  its  optimal 
potential  and  allowed  to  succeed  or  not  succeed  on  its 
own  merits.  Unfortunately,  this  ideal  is  rarely  achieved 
in  experiments.  It  is  almost  a  truism  that  the  principal 
lesson  learned  from  a  majority  of  experiments  is  that  the 
new capability, notwithstanding all effort expended,
was  not  ready  for  the  experiment. 

The experimental capability may not be ready for a
number of reasons. The hardware or software does not
perform as advertised. The experiment players are
undertrained and not fully familiar with its functionality.
Because it is new, techniques for optimum
employment are not mature and, by default, will be
developed by the experimental unit during the initial
experiment trials. If the experimental sensors (A)
cannot be functionally employed during the experiment,
there is no reason to expect they will detect
targets (B) more often than the current array of
sensors.

Ability  to  detect  change 

If the first experiment requirement is met, then the
transition from current to new sensors should be
accompanied by a change in the detections observed. If
a change in detections does not occur, the primary
concern is too much experimental noise. The ability to
detect change is a signal-to-noise problem. Too much
experimental error produces too much variability,
making it difficult to detect change. Many experiment
techniques are designed to reduce experiment variation:
calibrating instrumentation to reduce data
collection variation, limiting stimuli (targets) presentation
to only one or two variations to reduce response
(detections) variation, and controlling external environment
variations (time of day, visibility, etc.).
Sample size also affects the signal-to-noise ratio:
the computed statistical error variability decreases
as the number of observations increases.
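
The effect of sample size on the signal-to-noise ratio can be illustrated with a short sketch. The choice of statistic is an assumption made here for illustration (difference in means divided by the standard error of that difference, as in Welch's t statistic); the article does not prescribe one, and the scores below are invented.

```python
import math
import statistics
from typing import Sequence


def standard_error(observations: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(observations) / math.sqrt(len(observations))


def signal_to_noise(baseline: Sequence[float], treated: Sequence[float]) -> float:
    """Difference in means (signal) over the standard error of that
    difference (noise)."""
    signal = statistics.mean(treated) - statistics.mean(baseline)
    noise = math.sqrt(standard_error(baseline) ** 2 + standard_error(treated) ** 2)
    return signal / noise


# Hypothetical percent-detected scores per trial, invented for illustration.
current_small = [40.0, 55.0, 45.0, 60.0]
new_small = [50.0, 65.0, 55.0, 70.0]

# The same scores repeated: same 10-point difference, four times as many trials.
current_large = current_small * 4
new_large = new_small * 4

print(signal_to_noise(current_small, new_small))
print(signal_to_noise(current_large, new_large))  # larger: noise term has shrunk
```

The difference between the means is identical in both cases; only the number of observations changes, which shrinks the noise term and makes the same change easier to detect.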

Ability  to  isolate  the  reason  for  change 

Let us suppose the experimenter meets the first two
requirements: the new sensors are effectively employed
and the experiment design reduces variability and
produces an observable change (increase) in detections.
The question now is whether the detected change is due to the
intended cause (changing from old to new sensors) or
to something else. The scientific term for alternate
explanations of experimental data is confounded
results. In this example, an alternate explanation for
any increased detections on day two is a learning
effect. The sensor operators may have
been more adept at finding targets on day two because
of their experience with target presentations on day
one and, consequently, would have increased target
detections on day two whether the sensors were
changed or not. This potential learning effect dramatically
changes the conclusion that can be drawn from the detected change.

Scientists  have  developed  experimental  techniques  to 
eliminate  alternate  explanations  for  observed  change. 
These  include  counter-balancing  the  presentation  of 
stimuli  to  the  experimental  unit,  use  of  placebos  in 
drug research, inclusion of a control group, and
randomizing  participants  between  treatment  groups. 
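
Two of these techniques, counterbalancing and randomization, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, and it assumes several experimental units are available, whereas the two-day sensor example above uses only one.

```python
import random


def counterbalanced_schedule(units, treatments=("current sensors", "new sensors")):
    """Alternate the treatment order across units (AB, BA, AB, ...) so that a
    day-one-to-day-two learning effect falls on both treatments alike."""
    schedule = {}
    for index, unit in enumerate(units):
        first, second = treatments if index % 2 == 0 else treatments[::-1]
        schedule[unit] = {"day one": first, "day two": second}
    return schedule


def randomize_groups(participants, seed=None):
    """Randomly split participants into two groups (equal when the count is even)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical crews, invented for illustration.
crews = ["crew 1", "crew 2", "crew 3", "crew 4"]
for crew, plan in counterbalanced_schedule(crews).items():
    print(crew, plan)

control_group, treatment_group = randomize_groups(crews, seed=7)
print(control_group, treatment_group)
```

With an even number of units, half see the new sensors first and half see them second, so any improvement from practice no longer coincides with the change of sensors.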

Ability to relate the results to actual operations

Again, let us suppose that the experiment is
successful in employing the new capability, detecting
change, and isolating the cause. The final question is
whether experimental results are applicable to operational
forces in actual military operations. Experiment
design issues supporting generalization include operational
realism, representativeness of surrogate systems,
use of operational forces as the experimental unit, and
use of operational scenarios with a realistic reactive
threat.

Tradeoffs  in  designing  experiments 

A fundamental implication from these four experiment
requirements is that a 100 percent valid
experiment is not achievable. The four experiment
requirements cannot be fully satisfied in one experiment.
Satisfying one works against satisfying the other
three. Thus, decisions need to be made as to which
validity requirements are to be emphasized in any given
experiment.

All experiments are a balance between the four
validity requirements. Precision and control increase
the ability to detect and isolate change but often lead to
decreases in ability to relate results to actual operations.
Experiments that emphasize free play and uncertainty
in scenarios reflect conditions found in actual
operations and satisfy external validity Requirement
4, the ability to relate results. Conversely, experiments
emphasizing similar conditions with diminished free
play across multiple trials serve to reduce experiment
noise and confounding, thus satisfying internal validity
Requirements 2 and 3, the ability to detect and isolate
change.

Validity  priorities  differ  for  any  given  experiment. 
Experimenters  need  to  minimize  the  loss  of  one 
validity  requirement  because  of  the  priority  of  another. 
However,  tradeoff  is  inevitable.  In  settings  where  one 
expects  a  small  effect  and  it  is  important  to  determine 
the  precise  relationship  between  the  experiment 
treatment  and  its  effect,  the  priority  should  be  internal 
validity.  On  the  other  hand,  if  one  expects  a  large  effect 
and  it  is  important  to  determine  if  the  effect  will  occur 
in  the  operational  environment  with  typical  units,  then 
external  validity  is  the  priority. 

Different warfighting experiment methods provide different strengths

Warfighting experiments can be grouped into one of
four general methods: analytic war-game, constructive,
human-in-the-loop, and field experiments. The
experiment requirements just discussed provide a
structure for recognizing the strengths and weaknesses
of these four experiment methods. The relative strength
of a particular method in meeting each requirement
is depicted by the number of plus signs in
Figure 3.

Analytic war-game experiments typically employ
command and staff officers to plan and execute a
military operation. At certain decision points, the Blue
players give their course of action to a neutral White
Cell, which then allows the Red players to plan a
counter move, and so on. The White Cell adjudicates
each move using simulations to help determine the
outcome. Typical war-game experiments might involve
fighting the same campaign twice, using different
capabilities each time. The strength of war-game
experiments resides in the ability to detect any change
in the outcome, given major differences in the
strategies used. Additionally, to the extent that
operational scenarios are used and actual military units
are players, war-game experiments may reflect real-world
possibilities. A major limitation is the inability
to isolate the true cause of change because of the
myriad differences found in attempting to play two
different campaigns against a similar reactive threat.

Figure 3. Different experiment venues have different strengths

Constructive simulation experiments reflect the
closed-loop, force-on-force simulation employed by
the modeling and simulation community. In a closed-loop
simulation, no human intervention occurs in the
play after designers choose the initial parameters and
then start and finish the simulation. Constructive
simulations allow repeated replay of the same battle
under identical conditions while systematically varying
parameters: insertion of a new weapon or sensor
characteristic, employment of a different resource or
tactic, or encounter of a different threat. Constructive
simulation experiments with multiple runs are ideal to
detect change and to isolate its cause. Because
modeling complex events requires many assumptions,
critics often question the applicability of constructive
simulation results to operational situations.
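
The idea of replaying the same battle under identical conditions while varying one parameter can be sketched with a toy closed-loop model. Everything here is an assumption for illustration: the one-line detection model, the function names `run_battle` and `replicate`, and the probabilities are invented and far simpler than any real force-on-force simulation.

```python
import random
import statistics


def run_battle(detect_probability: float, targets: int, seed: int) -> int:
    """One closed-loop run: no human input once the parameters are set.
    Returns the number of targets detected."""
    rng = random.Random(seed)
    return sum(rng.random() < detect_probability for _ in range(targets))


def replicate(detect_probability: float, targets: int = 50, runs: int = 200):
    """Replay the battle many times. Seeds 0..runs-1 are reused for every
    parameter setting, so each alternative faces identical random draws."""
    results = [run_battle(detect_probability, targets, seed) for seed in range(runs)]
    return statistics.mean(results), statistics.stdev(results)


# Hypothetical detection probabilities, invented for illustration.
for label, probability in [("current sensor", 0.45), ("new sensor", 0.60)]:
    mean, spread = replicate(probability)
    print(f"{label}: mean detections {mean:.1f}, std dev {spread:.1f}")
```

Because only the sensor parameter differs between the two sets of runs, any difference in mean detections can be attributed to it; whether the toy model resembles real operations is exactly the applicability question raised above.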

Human-in-the-loop virtual experiments are a blend
of constructive experiments and field experiments. In a
command and control human-in-the-loop warfighting
experiment, a military staff receives real-time, simulated
sensor inputs, makes real-time decisions to
manage the battlespace, and directs simulated forces
against simulated threat forces. The use of actual
military operators and staffs allows this type of
experiment to reflect warfighting decision-making
better than purely closed-loop constructive experiments.
However, humans often play differently against
computer opponents than against real opponents.
Additionally, when humans make decisions, variability
increases, and changes are more difficult to detect.

Field experiments are war-games conducted in the
actual environment, with actual military units and
equipment and operational prototypes. As such, the
results of these experiments are highly applicable to
real situations. Good field experiments, like good
military exercises, are the closest thing to real military
operations. A major advantage of the previous three
experiment venues is their ability to examine capabilities
that do not yet exist by simulating those
capabilities. Field experiments, on the other hand,
require working prototypes of new capabilities. Interestingly,
while field experiments provide the best
opportunity to examine practical representations of
these new capabilities, field experiments are the most
difficult environment in which to employ a new capability:
the new capability has to function and the operators need
to know how to employ it. Difficulties also reside in
detecting change and isolating the true cause of any
detected change because multiple trials are seldom
conducted in field experiments and the trial conditions
include much of the uncertainty, variability, and
challenges of actual operations.

Employing a campaign of experiments to increase validity

Since a single experiment method cannot satisfy all
four requirements, a comprehensive experiment campaign
is required. A campaign of experiments^4 can
consist of a number of successive, individual experiments
to fully examine proposed solutions to complex military
problems. It can also consist of a set of experiments
conducted in parallel with information and findings
passed back and forth. A campaign of experiments can
accumulate validity across the four requirements.

Figure 4. Experiment campaign requirements during the capability development process


Emphasizing different experiment requirements throughout the capability development process

A  comprehensive  capability-development  program 
should  include  a  campaign  of  individual  experiments 
that  emphasize  different  experiment  requirements. 
Figure  4  illustrates  one  example.  The  campaign  starts 
at  the  top  with  discovery  activities  and  proceeds  to  the 
bottom  with  capability  implementation  into  the  joint 
force.  Each  step  in  the  campaign  identifies  possible 
experimentation  goals.  On  the  right  of  the  experiment 
goals,  the  “pluses”  portray  the  relative  importance  of 
the  four  validity  requirements  for  that  experimentation 
step.  The  following  discussion  identifies  possible 
experiment  venues  that  can  be  employed  at  each 
capability-development  step  to  address  the  goals  and 
validity  requirements. 

The primary considerations during concept discovery
are relevance and comprehensiveness. To what extent do
initial articulations of future operational environments
include a comprehensive description of expected
problems along with a full set of relevant proposed
solutions? Relevancy, however, should not be overstressed.
It is important to avoid eliminating unanticipated
or unexpected proposals that subsequent experiments
could investigate further.

Finding an initial set of potential capabilities that
empirically show promise is most important in concept
refinement. Early experiments here examine idealized
capabilities (future capabilities with projected characteristics)
to determine whether they lead to increased
effectiveness. Initial concept refinement experiments are
dependent on simulations to represent simulated capabilities
in simulated environments. Accurately isolating
the reason for change is less critical at this stage in order
to permit “false positives.” Allowing some false solutions
to progress and be examined in later experiments under
more realistic environments is more important than
eliminating potential solutions too quickly. Concept
refinement is dependent on the simulation-supported
experiment such as constructive, analytic war-game, and
human-in-the-loop experiments.

Quantifying operational improvements and correctly
identifying the causative capabilities are paramount in
providing evidence for concept assessment. Concept
justification is dependent on experiments with better-defined
capabilities across multiple environments.
Constructive experiments can provide statistically
defensible evidence of improvements across a wide
range of conditions. Human-in-the-loop and field
experiments with realistic surrogates can provide early
evidence for capability usability and relevance. Incorporating
human decision-makers into human-in-the-loop
and field experiments is also essential early in the
capability-development process. Human operators tend
to find new ways to solve problems.

Figure 5. Model-exercise-model process

Pre-model constructive simulation: examine capability trade-offs in multiple scenario
paths. Optimize exercise productivity: identify critical capabilities and scenario
decision points to observe.

Exercise (war-game): (+) real staff/operators; (+) reactive threat; (-) single trial
(repeatability?); (-) no comparisons; (-) no analytic results.

Post-model constructive simulation: conduct sensitivity excursions. Increase rigor:
isolate why results occurred; examine result repeatability. Increase applicability:
quantify impact of capabilities and decisions.


In  prototype  refinement,  one  should  anticipate  large 
effects  or  the  implementation  might  not  be  cost 
effective.  Accordingly,  the  experiment  can  focus  on 
the  usability  of  working  prototypes  in  a  realistic 
experiment  environment.  To  do  this,  the  experiment 
must  be  able  to  isolate  the  contributions  of  training, 
user  characteristics,  scenario,  software,  and  operational 
procedures  to  prototype  improvements  in  order  to 
refine  the  right  component.  Human-in-the-loop  and 
field  experiments  with  realistic  surrogates  in  realistic 
operational  environments  provide  the  experimental 
context  for  assessing  gains  in  effectiveness.  Human 
operators  find  unexpected  ways  to  employ  new 
technologies  effectively. 

Applicability to the warfighting operational environment
is paramount in prototype assessment. If the
capability is difficult to use or the desired gains are not
readily apparent in the operational environment, it will
be difficult to convince combatant commanders to
employ it. Uncovering exact causal chains is less
important, while human operators are essential to
ensuring that the new technology can be employed
effectively. Prototype assessment experiments are often
embedded within joint exercises and operations.

Emphasizing different experiment requirements via a model-exercise-model process

Another type of experiment campaign can be
organized around the requirement to conduct large
war-games or large field exercises to investigate the
effectiveness of new capabilities. Because these large
events are player resource intensive and often include
multiple experimental capabilities, few opportunities
exist to examine disentangled alternative capabilities or
alternative situations that would allow meaningful
comparisons. The model-exercise-model paradigm
depicted in Figure 5 can enhance the usefulness of
war-games and exercises. This paradigm consists of
conducting early constructive simulation experiments
prior to the war-game or exercise and then following
these events with a second set of postexercise
constructive experiments.

Early  constructive  simulation  experiments  use  the 
same  Blue  and  Red  forces  anticipated  to  be  played  in 
the  exercise.  This  pre-event  simulation  examines 
multiple alternative Blue force capability configurations
against different Red force situations. This allows
experimenters  to  determine  the  most  robust  Blue  force 
configuration  across  the  different  Red  force  scenarios. 
It  also  helps  to  focus  the  exercise  by  pinpointing 
potential  critical  junctures  to  be  observed  during  the 
follow-up  exercise. 

The war-game or exercise executes the best Blue force
configuration identified during the pre-event simulation.
The “best configuration” is the one in which, according to the
pre-exercise simulation, the new capability dramatically
improved Blue’s outcome. The exercise reexamines
this optimal configuration and scenario with independent
and reactive Blue and Red forces. Choosing the
scenario that provides the best opportunity for the new
capabilities to succeed is best because large exercises
include the “fog of war,” and experimental capabilities
rarely perform as well in the real environment as in
simulation. Therefore, it makes sense to give the new
capability its best chance to succeed. If it does not
succeed in a scenario designed to allow it to succeed, it
most likely would not succeed in other scenarios.

Experimenters use the exercise results to calibrate
the original constructive simulation for further postevent
simulation analysis. Calibration involves adjusting
simulation inputs and parameters to better match the
play of the simulation to the play of the exercise. This
adds credibility to the simulation. Rerunning the pre-event
alternatives in the calibrated model provides a
more credible interpretation of differences now observed
in the simulation. Additionally, the postevent
calibrated simulation can substantiate (or not) the
implications of the exercise recommendations by
conducting causal analysis. Causal analysis is a series
of “what if” sensitivity runs in the simulation to
determine whether the exercise recommendations
make a difference in the calibrated simulation outcome.
Postexercise simulation runs can also examine
what might have occurred if the Red or Blue forces had
made different decisions during the exercise.
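
Calibration followed by “what if” sensitivity runs can be sketched with a toy model. This is a hedged illustration only: the single-parameter detection model, the grid search used for calibration, the function names `simulate` and `calibrate`, and the exercise result of 24 detections out of 50 targets are all assumptions invented for the example, not the method of any particular simulation.

```python
import random
import statistics


def simulate(detect_probability: float, targets: int = 50, runs: int = 300) -> float:
    """Toy constructive simulation: mean detections over seeded replications."""
    totals = []
    for seed in range(runs):
        rng = random.Random(seed)
        totals.append(sum(rng.random() < detect_probability for _ in range(targets)))
    return statistics.mean(totals)


def calibrate(observed_detections: float, candidates) -> float:
    """Pick the input value whose simulated outcome best matches the exercise."""
    return min(candidates, key=lambda p: abs(simulate(p) - observed_detections))


# Hypothetical exercise result: 24 of 50 targets detected with the new sensor.
grid = [round(0.30 + 0.02 * step, 2) for step in range(21)]  # 0.30 to 0.70
calibrated_p = calibrate(24.0, grid)

# "What if" sensitivity runs around the calibrated value.
for change in (-0.10, 0.0, 0.10):
    p = calibrated_p + change
    print(f"p={p:.2f}: mean detections {simulate(p):.1f}")
```

In a real campaign the calibration would adjust many inputs at once and the sensitivity runs would vary decisions and capabilities rather than a single probability, but the sequence (match the exercise, then vary one thing at a time) is the same.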

Summary 

Can experiments fail? Yes, they can fail to provide
sufficient evidence to determine whether the manipulated
variable does (or does not) cause an effect. If the
experimenter  is  unable  to  answer  each  of  the  four 
requirements  in  a  positive  manner,  a  meaningful 
conclusion  is  not  possible  concerning  the  impact  of  a 
proposed  capability. 

Designing  individual  warfighting  experiments  is  an 
art because every experiment is a compromise. The
logical approach in this article provides an understanding
of the choices available to meet the four experiment
validity  requirements  and  the  strengths  and  weaknesses 
inherent  in  typical  experiment  venues.  Designing  an 
individual  experiment  involves  making  cognizant 
tradeoffs  among  the  four  requirements  to  provide 
sufficient  credible  evidence  bounded  by  explicated 
limitations  to  resolve  the  hypothesis. 

While a single experiment will not satisfy all four
requirements, a campaign of experiments can accumulate
validity and overall confidence in experiment
results. A comprehensive experiment program includes
a series of individual experiments, each emphasizing
different experiment requirements. In this campaign,
no single experiment is expected to carry the entire
weight of the decision. Each experiment contributes its
strength, and the final conclusions rest on the
confidence accumulated across them. The whole is
greater than any part.

So, how much of this is applicable to acquisition
testing? The follow-up article in the next issue will
discuss the similarities and differences between tests and
experiments in several areas: the planning process
(especially designing valid tests and experiments) along
with the execution and reporting process. The next
article will focus on clearing away misperceptions of
where efficiencies could be gained by sharing resources
and expertise.

Endnotes

^1 This article draws heavily from portions previously printed in my book,
Kass, R. A., The Logic of Warfighting Experiments, published in 2006 by
the Command and Control Research Program (CCRP) of the ASD/NII,
which has graciously granted permission to include that material in this
work. Figures 1 through 5 here are Figures 9, 8, 20, 39, and 40 in that
work. Readers can download or order the larger document from the
CCRP website at http://www.dodccrp.org.

^2 A good discussion of many best practices is found in Alberts, D. S.
and Hayes, R. E. 2002. Code of Best Practices for Experimentation. DoD
CCRP publication series, Washington, D.C.: U.S. Government Printing Office.

^3 The Logic of Warfighting Experiments devotes a separate chapter to
each of the four validity requirements.

^4 For a comprehensive examination of the value of experiment
campaigns to address warfighting problems, see Alberts, D. S. and Hayes,
R. E. 2005. Campaigns of Experimentation. DoD CCRP publication
series, Washington, D.C.: U.S. Government Printing Office.

References 

Cook, T. D. and Campbell, D. T. 1979. Quasi-Experimentation:
Design and Analysis Issues for Field
Settings. Houghton Mifflin: Boston.

Merriam-Webster Dictionary online. http://www.m-w.com

Shadish,  W.  R.,  Cook,  T.  D.  and  Campbell,  D.  T. 
2002.  Experimental  and  Quasi-Experimental  Designs 
for  Generalized  Causal  Inference.  Houghton  Mifflin: 
Boston.  Page  507. 

Shadish,  W.  R.,  Cook,  T.  D.  and  Campbell,  D.  T. 
2002.  Experimental  and  Quasi-Experimental  Designs 
for  Generalized  Causal  Inference.  Houghton  Mifflin: 
Boston.  Page  3. 


174  ITEA  Journal 


